Abstract

In this paper, we address the global and preemptive 
energy-aware scheduling problem of sporadic constrained- 
deadline tasks on DVFS-identical multiprocessor platforms. 
We propose an online slack reclamation scheme which prof- 
its from the discrepancy between the worst- and actual-case 
execution time of the tasks by slowing down the speed of the 
processors in order to save energy. Our algorithm called 
MORA takes into account the application-specific con- 
sumption profile of the tasks. We demonstrate that MORA 
does not jeopardize the system schedulability and we show 
by performing simulations that it can save up to 32% of energy
(on average) compared to execution without any energy-aware
algorithm.



1. Introduction 

Context of the study. Nowadays, many modern processors
can operate at various supply voltages, where different
supply voltages lead to different clock frequencies and to
different processing speeds. Since the power consumption
of a processor is usually a convex and increasing function
of its speed, the slower it runs, the less it consumes [11].
Among the most recent and popular such processors,
one can cite the Intel PXA27x processor family [21],
used by many PDA devices [20].

Many computer systems, especially embedded sys- 
tems, are now equipped with such voltage (speed) scal- 
ing processors and adopt various energy-efficient strate- 
gies for managing their applications intelligently. More- 
over, many recent energy-constrained embedded systems 
are built upon multiprocessor platforms because of their 
high computational requirements. As pointed out in [10, 11],
another advantage is that multiprocessor systems are
more energy efficient than equally powerful uniprocessor
platforms, because raising the frequency of a single processor
results in a multiplicative increase of the consumption
while adding processors leads to an additive increase.

Supported by this emerging technology, the Dynamic
Voltage and Frequency Scaling (DVFS) [15] framework
has become a major concern for multiprocessor power-aware
embedded systems. For real-time systems, this framework 
consists in reducing the system energy consumption by ad- 
justing the working voltage and frequency of the processors, 
while respecting all the timing constraints. 

Previous work. There is a large body of research
on the uniprocessor energy-aware real-time scheduling
problem [5, 13, 22, 23, 32]. Among those, many slack reclamation
approaches have been developed over the years.
Such techniques dynamically collect the unused computation
time at the end of each early task completion and share
it among the remaining pending tasks. Examples of such
approaches include the ones proposed in [5] [22] ES [33].
Some reclaiming algorithms even anticipate the early completion
of tasks to further reduce the CPU speed [5, 27],
some having different levels of "aggressiveness" Q.

In [15], Kuo et al. survey the state of the art in energy-aware
algorithms for multiprocessor environments. As mentioned
in this survey, many studies (see for instance
Hi] [16] [11] US IH ED) consider the frame-based
task model, i.e., all the tasks share a common deadline and
this "frame" is indefinitely repeated. Among the most
interesting studies which consider this task model, Zhu et
al. [34] explored online slack reclamation schemes (i.e.,
running during the system execution) for dependent and
independent tasks. In [18], Kuo et al. propose a set of
energy-efficient scheduling algorithms with different task
remapping and slack reclamation schemes. In [11], the authors
address independent tasks, where task migrations are
not allowed. In [14], the authors provide some techniques
with and without allowing task migration, while assuming
that tasks share the same power consumption function and
each processor may run at a selected speed, independently
of the speeds of the others. In [16], the authors consider
that tasks are allowed to have different power consumption
functions. In PP, energy-aware scheduling of frame-based
tasks was explored for multiprocessor architectures
in which all the processors must share
the same speed at any time. Finally, the authors of [13]
propose a slack reclamation scheme for identical multiprocessor
platforms, while considering frame-based tasks for
which the distribution of the computation times is assumed
to be known.

Targeting a sporadic task model, Anderson and
Baruah [3] explored the trade-off between the total energy
consumption of task executions and the number of required
processors, where all the tasks run at the same common
speed. In previous work [26], we provided a technique
that determines the minimum common offline speed for every
task under the global-EDF policy Q, while considering
identical multiprocessor platforms. Furthermore, we proposed
in the same study an online algorithm called MOTE
which was, to the best of our knowledge, the first to address
the global and preemptive energy-aware scheduling problem
of sporadic constrained-deadline tasks on multiprocessors.
The main idea of MOTE is to anticipate at run-time
the coming idle instants in the schedule in order to reduce
the processor speeds accordingly. This algorithm cannot be
considered as a slack reclamation scheme since it does not
directly take advantage of early task completions, but it
can be combined with slack reclaiming techniques (and in
particular with MORA) in order to improve the energy savings.

Contribution of the paper. In this paper, we propose a
slack reclamation scheme called MORA for the global and
preemptive energy-aware scheduling problem of sporadic
constrained-deadline real-time tasks on a fixed number of
DVFS-capable processors. According to [15] and to the
best of our knowledge, this is the first work which addresses
a slack reclamation scheme in this context. Although
most previous studies on multiprocessor energy-efficient
scheduling assumed that the actual execution time of a task
is equal to its Worst-Case Execution Time (WCET), such
as those in [2] [6] [14] [31] for instance, this work is motivated
by the scheduling of tasks in practice, where tasks
usually complete earlier than their WCET UJ [34].
The proposed algorithm MORA is an online scheme which
exploits early task completions by using as much of the
unused time as possible to reduce the speed of the processors.
Although it has been inspired by the uniprocessor "Dynamic
Reclaiming Algorithm" (DRA) proposed in [5], the
way in which it profits from the unused time is very different
from the DRA since MORA takes into account the
application-specific consumption profile of the tasks.

Organization of the paper. The document is organized
as follows: in Section 2 we introduce our model of computation,
in particular our task and platform model; in Section 3
we present our online slack reclamation technique
called MORA and we prove its correctness; in Section 4
we present our simulation results and in Section 5 we
introduce future research directions and we conclude.

Processor type: Intel XScale [1]

Frequency (MHz)                    150    400    600    800    1000
Speed s_k                          0.15   0.4    0.6    0.8    1.0
Voltage (V)                        0.75   1.0    1.3    1.6    1.8
Power in run mode (mW): P(s_k)     80     170    400    900    1600
Power in idle mode (mW): P_idle    40

Table 1. Intel XScale characteristics

2. Model of computation 
2.1. Platform model 

We consider multiprocessor platforms composed of a
known and fixed number $m$ of DVFS-identical processors
$\{\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_m\}$. "DVFS-identical" means that (i) all the
processors have the same profile (in terms of consumption,
computational capabilities, etc.) and are interchangeable,
(ii) two processors running at the same frequency execute
the same amount of execution units, and (iii) all the processors
have the same minimal and maximal operating frequency,
denoted by $f_{\min}$ and $f_{\max}$, respectively. The processors
are referred to as independent, with the interpretation
that they can operate at different frequencies at the
same time [29] [24]. Furthermore, we assume that each processor
can dynamically adapt its operating frequency (and
voltage) at any time during the system execution, independently
of the others. The time overheads of frequency
(voltage) switching are assumed to be negligible, as in
many studies Rl [9] [25] [32] [35].

We define the notion of speed $s$ of a processor as the ratio
of its operating frequency $f$ over its maximal frequency,
i.e., $s \stackrel{\mathrm{def}}{=} f / f_{\max}$, with the interpretation that a job that
executes on a processor running at speed $s$ for $R$ time units
completes $s \times R$ execution units. When only $K$ discrete
frequencies are available to a processor, they are sorted in
increasing order of frequency and denoted by $f_1, \ldots, f_K$.
For each frequency $f_k$ such that $1 \le k \le K$, we denote
by $s_k$ the corresponding speed (i.e., $s_k \stackrel{\mathrm{def}}{=} f_k / f_{\max}$) and by
$P(s_k)$ the power consumption (energy consumption rate)
per second while the processor is running at speed $s_k$. The
available frequencies and the corresponding core voltages
of the Intel XScale processor [1] that will be used in our
experiments are outlined in Table 1. Notice that, from our
definition of the processor speed, $s_{\max} \stackrel{\mathrm{def}}{=} f_{\max} / f_{\max} = 1$
whatever the considered processor. Moreover, due to the finite
number of speeds that are available to any practical processor,
any speed $s$ computed by any energy-aware algorithm
must be translated into one of the available speeds.
In this work, this translation is performed by the function

$S(s) \stackrel{\mathrm{def}}{=} \min\{s_k \mid s_k \ge s\}$.
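The translation function $S(s)$ can be sketched in a few lines of Python. This is only an illustration: the speed list is taken from Table 1, while the function name `translate_speed` and the tolerance `eps` are assumptions introduced here and do not come from the paper.

```python
# Speeds of the Intel XScale processor as listed in Table 1.
XSCALE_SPEEDS = [0.15, 0.4, 0.6, 0.8, 1.0]


def translate_speed(s, speeds=XSCALE_SPEEDS, eps=1e-9):
    """Return S(s): the lowest available speed that is at least s.

    eps is an assumed tolerance that absorbs floating-point rounding,
    so that a computed value such as 0.6000000001 still maps to 0.6.
    """
    candidates = [sk for sk in speeds if sk >= s - eps]
    if not candidates:
        raise ValueError("requested speed exceeds the maximal speed")
    return min(candidates)


# A requested speed of 0.5 is not available, so it is rounded up.
print(translate_speed(0.5))
```

Rounding up rather than down is what keeps the translation safe: the job never runs slower than the speed computed by the energy-aware algorithm.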

2.2. Application model 

A real-time system $\tau$ is a set of $n$ functionalities denoted
by $\{\tau_1, \tau_2, \ldots, \tau_n\}$. Every functionality $\tau_i$ is modeled
by a sporadic constrained-deadline task characterized
by three parameters $(C_i, D_i, T_i)$ - a Worst-Case Execution
Time (WCET) $C_i$ at maximal processor speed $s_{\max}$
(expressed in milliseconds for instance), a minimal inter-arrival
delay $T_i$ and a relative deadline $D_i \le T_i$ - with
the interpretation that the task $\tau_i$ generates successive jobs
$\tau_{i,j}$ (with $j = 1, \ldots, \infty$) arriving at times $a_{i,j}$ such that
$a_{i,j} \ge a_{i,j-1} + T_i$ (with $a_{i,1} \ge 0$), each such job has a
worst-case execution time of at most $C_i$ time units (at maximal
processor speed $s_{\max}$), and must be completed at (or
before) its absolute deadline noted $d_{i,j} \stackrel{\mathrm{def}}{=} a_{i,j} + D_i$.
According to our definition of the processor speed, a processor
running at speed $s_{\max} = 1$ may take up to $C_i$ time units
to complete a job $\tau_{i,j}$ and, at a given speed $s$, its WCET is
$C_i / s$. Notice that, since $D_i \le T_i$, successive jobs of any task
$\tau_i$ do not interfere with each other.

We define the density $\delta_i$ of the task $\tau_i$ as the ratio of
its WCET at maximal speed $s_{\max}$ over its deadline, i.e.,
$\delta_i \stackrel{\mathrm{def}}{=} C_i / D_i$. We assume that this ratio is not larger than 1
for every task, since a task with a density larger than 1 is
never able to meet its deadlines (since task parallelism is
forbidden in this work). The maximal density $\delta_{\max}(\tau)$ of the
system is defined as $\delta_{\max}(\tau) \stackrel{\mathrm{def}}{=} \max_{i=1}^{n} \{\delta_i\}$ and its total
density is defined as $\delta_{\mathrm{sum}}(\tau) \stackrel{\mathrm{def}}{=} \sum_{i=1}^{n} \delta_i$. In our study,
all the tasks are assumed to be independent, i.e., there is
no communication, no precedence constraint and no shared
resource (except the processors) between them.
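As a quick illustration of these definitions, the following Python sketch computes the densities of the five-task system that the paper uses as a running example in Section 3.1. The helper names are hypothetical; exact fractions are used only to avoid rounding in the printed values.

```python
from fractions import Fraction

# (C_i, D_i, T_i) of the five tasks of the running example in Section 3.1.
TASKS = [(6, 14, 30), (6, 15, 35), (8, 16, 40), (2, 17, 45), (6, 18, 50)]


def densities(tasks):
    """Return the density C_i / D_i of every task as an exact fraction."""
    for c, d, t in tasks:
        # Constrained deadlines and a density of at most 1.
        if not (0 < c <= d <= t):
            raise ValueError(f"invalid task parameters: {(c, d, t)}")
    return [Fraction(c, d) for c, d, _ in tasks]


def max_density(tasks):
    return max(densities(tasks))


def total_density(tasks):
    return sum(densities(tasks))


print(max_density(TASKS))            # 1/2
print(float(total_density(TASKS)))   # about 1.78
```

For this task set the total density stays below the number of processors (2) used in the example, which is consistent with a system that has slack to reclaim.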

At any time $t$ in any schedule $S$, a job $\tau_{i,j}$ is said to be
active iff $a_{i,j} \le t$ and it is not completed yet in $S$. Moreover,
an active job is said to be running at time $t$ in $S$ if it is
executing on a processor. Otherwise, the active job is pending
in a ready-queue of the operating system and we say that
it is waiting. Furthermore, a job is said to be dispatched at
time $t$ in $S$ if it passes from the waiting state to the running
state at time $t$.

Although certain benchmarks provide measured power
consumption, we should not ignore that different applications
may have different instruction sequences and require
different function units in the processor, thus leading to
different dynamic consumption profiles. As was already
done in [30], we hence introduce a measurable parameter
$e_i$ for each task $\tau_i$ that reflects this application-specific
power difference between the applications and the measured
benchmark. Accordingly, the consumption of any
task $\tau_i$ executed for 1 time unit at speed $s_k$ can be estimated
by $e_i \cdot (P(s_k) - P_{\mathrm{idle}}) + P_{\mathrm{idle}}$ ED, where $P(s_k)$ and $P_{\mathrm{idle}}$
are defined as in Table 1. In the remainder of this paper,
we denote by $E_i(R, s_k)$ the energy consumed by the task $\tau_i$
when executed for $R$ time units at speed $s_k$ and we define it
as $E_i(R, s_k) \stackrel{\mathrm{def}}{=} R \cdot (e_i \cdot (P(s_k) - P_{\mathrm{idle}}) + P_{\mathrm{idle}})$. As we
will see in Section 3.3, MORA uses these energy consumption
functions in order to improve the energy saving that it
provides. This improvement makes MORA very different
from the uniprocessor dynamic reclaiming algorithm DRA
proposed in [5].
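The energy function $E_i(R, s_k)$ is simple enough to be written down directly. The Python sketch below uses the run-mode and idle-mode powers of Table 1; the function name `energy` and the sample values of $e_i$ are assumptions chosen for illustration, and the result is expressed in mW multiplied by the time unit in which $R$ is given.

```python
# Power figures of the Intel XScale processor from Table 1, in mW.
P_RUN_MW = {0.15: 80, 0.4: 170, 0.6: 400, 0.8: 900, 1.0: 1600}
P_IDLE_MW = 40


def energy(e_i, duration, s_k):
    """E_i(R, s_k) = R * (e_i * (P(s_k) - P_idle) + P_idle).

    e_i      : application-specific parameter of the task
    duration : R, the time spent executing at speed s_k
    s_k      : one of the discrete speeds of Table 1
    """
    return duration * (e_i * (P_RUN_MW[s_k] - P_IDLE_MW) + P_IDLE_MW)


# Six execution units take 6 time units at speed 1.0 and 10 at speed 0.6.
full_speed = energy(1.0, 6 / 1.0, 1.0)
slowed = energy(1.0, 6 / 0.6, 0.6)
print(full_speed, slowed)
```

Even though the slowed execution lasts longer, the convexity of the power function makes it cheaper in this example; a task with a smaller $e_i$ benefits less from the same slowdown, which is what MORA exploits when choosing which job receives the slack.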

2.3. Scheduling specifications 

We consider in this study the global scheduling problem
of sporadic constrained-deadline tasks on multiprocessor
platforms. "Global" scheduling algorithms, in contrast
to partitioned algorithms, allow different tasks and different
jobs of the same task to be executed upon different processors.
Furthermore, we consider preemptive scheduling
and Fixed Job-level Priority assignment (FJP), with the fol- 
lowing interpretations. In the preemptive global scheduling 
problem, every job can start its execution on any processor 
and may migrate at run-time to any other processor if it gets 
meanwhile preempted by a higher-priority job. We assume 
in this paper that preemptions are carried out with no loss 
or penalty. Fixed Job-level Priority assignment means that 
the scheduler assigns a priority to jobs as soon as they arrive 
and every job keeps its priority constant until it completes. 
Global Deadline Monotonic and Global Earliest Deadline 
First Q are just some examples of such scheduling algo- 
rithms. 

3. The Multiprocessor Online Reclaiming Algorithm (MORA)

3.1. Notations 

During the system execution, every active job $\tau_{i,j}$ has
two associated speeds noted $s_{i,j}$ and $s^{\mathrm{off}}_{i,j}$. The speed $s_{i,j}$
denotes the speed that a processor adopts while executing $\tau_{i,j}$.
We assume that these execution speeds $s_{i,j}$ can be modified
at any time during the system execution, even during the
execution of $\tau_{i,j}$, and that any modification is instantaneously
reflected on the processor speed. On the other hand, the speed
$s^{\mathrm{off}}_{i,j}$ is the offline precomputed execution speed of $\tau_{i,j}$,
in the sense that the value of $s_{i,j}$ is always set to $s^{\mathrm{off}}_{i,j}$
at the arrival time of $\tau_{i,j}$. These offline speeds $s^{\mathrm{off}}_{i,j}$ are
determined before the system execution and remain constant
at run-time. They may be simply set to the maximal processor
speed $s_{\max}$, or they can be determined by an offline
energy-aware strategy, such as the one proposed in [26] for
instance. These offline speeds must ensure that all the
deadlines are met when the set of tasks is scheduled upon
the $m$ processors, even if every job of every task presents
its WCET. Notice that, since each task generates an infinity
of jobs, the method proposed in [26] determines a common
speed for every task and assumes that every job $\tau_{i,j}$
inherits the offline speed of $\tau_i$ at run-time.

MORA is based on reducing online (i.e., while the system
is running) the execution speed $s_{i,j}$ of the jobs in order
to provide energy savings while still meeting all the deadlines.
To achieve this goal, MORA detects whenever the
speed $s_{i,j}$ of an active job $\tau_{i,j}$ can safely be reduced by
comparing the schedule which is actually
produced (called the actual schedule hereafter) with the
offline schedule defined below. We will see in the remainder
of this section that our algorithm MORA always refers to
this offline schedule in order to produce the actual one.

Definition 1 (The offline schedule) The offline schedule is
the schedule produced by the considered scheduling algorithm
in which every job of every task $\tau_i$ runs at its offline
speed $s^{\mathrm{off}}_{i,j}$ and presents its WCET.

Figure 1(a) depicts an example of an offline schedule
and illustrates the notations that will be used throughout the
paper. In this picture, a 5-task system is executed upon 2
processors, where only the first job of each task is represented.
The characteristics of the tasks are the following
(remember that $\tau_i = (C_i, D_i, T_i)$): $\tau_1 = (6, 14, 30)$,
$\tau_2 = (6, 15, 35)$, $\tau_3 = (8, 16, 40)$, $\tau_4 = (2, 17, 45)$ and
$\tau_5 = (6, 18, 50)$. Assuming Global-EDF, we have the following
priority order: $\tau_{1,1} > \tau_{2,1} > \tau_{3,1} > \tau_{4,1} > \tau_{5,1}$. Furthermore,
we assume in this example that the offline speed $s^{\mathrm{off}}_{i,j}$
of every job $\tau_{i,j}$ is the maximal processor speed $s_{\max} = 1$.



(a) Offline schedule. (b) Actual schedule.

Figure 1. Offline and actual schedules.

At run-time, whenever any job is dispatched to any processor
$\mathcal{P}_\ell$ in the offline schedule, MORA also dispatches
it to $\mathcal{P}_\ell$ in the actual one. That is, assuming the same set
of tasks as in Figure 1(a), Figure 1(b) depicts the actual
schedule that is produced if the actual execution times of
the jobs $\tau_{1,1}, \ldots, \tau_{5,1}$ are respectively 3, 2, 3, 2, 6. At any
time $t$, we denote by $\mathrm{rem}_{i,j}(t)$ and $\mathrm{rem}^{\mathrm{off}}_{i,j}(t)$ the worst-case
remaining execution time of job $\tau_{i,j}$ at speed $s_{\max}$ in
the actual and offline schedule, respectively. We assume
that these quantities are updated at run-time for every active
job $\tau_{i,j}$. For instance in Figure 1 at time $t = 3$, we have
$\mathrm{rem}_{1,1}(3) = 0$ (since $\tau_{1,1}$ completes at time $t = 3$ in the
actual schedule) and $\mathrm{rem}^{\mathrm{off}}_{1,1}(3) = 3$. Notice that from our
definition of a processor speed, the worst-case remaining
execution time of job $\tau_{i,j}$ at speed $s$ in the actual and offline
schedule is $\mathrm{rem}_{i,j}(t) / s$ and $\mathrm{rem}^{\mathrm{off}}_{i,j}(t) / s$, respectively. We denote
by $\mathrm{disp}_{i,j}(t)$ the earliest time at which $\tau_{i,j}$ is dispatched in
the offline schedule, when only the set of active jobs at time
$t$ in the offline schedule is considered. For instance in
Figure 1(a) we have $\mathrm{disp}_{4,1}(0) = 6$ and $\mathrm{disp}_{5,1}(0) = 8$. Finally,
$\mathrm{nextdisp}(\mathcal{P}_\ell, t)$ denotes the earliest instant after time
$t$ at which a job which is not completed in the actual schedule
at time $t$ is dispatched to $\mathcal{P}_\ell$ in the offline schedule.
Again, only the set of active jobs at time $t$ in the offline
schedule is considered to compute $\mathrm{nextdisp}(\mathcal{P}_\ell, t)$. For
instance in Figure 1(a) we have $\mathrm{nextdisp}(\mathcal{P}_2, 2) = 6$ and
$\mathrm{nextdisp}(\mathcal{P}_1, 3) = 6$.

3.2. The α-queue

Since the job arrival times are unknown under
the sporadic task model, computing and storing the
entire offline schedule cannot be done before the system
execution. Hence, our algorithm only stores and updates at
run-time a sufficient part of the offline schedule. This kind
of approach (i.e., using a dynamic data structure for
embodying a sufficient part of the offline schedule) was
previously proposed in [5]. As in [5], we call this data structure
the α-queue. The α-queue is a list that contains, at any time $t$,
the worst-case remaining execution time $\mathrm{rem}^{\mathrm{off}}_{i,j}(t)$ of every
active job $\tau_{i,j}$ in the offline schedule. This list is managed
according to the following rules, which are widely inspired
from [5].

α-Rule 1 At any time, the α-queue is sorted by decreasing
order of the job priorities, with the $m$ highest priority jobs
at the head of the queue.


α-Rule 3 Upon arrival of a job $\tau_{i,j}$ at time $t$, $\tau_{i,j}$ inserts its
WCET $C_i$ into the α-queue in the correct priority position.
This happens only once for each arrival; there is no re-insertion
upon return from preemption.

α-Rule 4 As time elapses, the $m$ fields $\mathrm{rem}^{\mathrm{off}}_{i,j}(t)$ (if any) at
the head of the α-queue are decreased with a rate proportional
to the offline speeds $s^{\mathrm{off}}_{i,j}$. Whenever one field reaches
zero, that element is removed and the update continues, still
with the $m$ first elements (if any). Obviously, no update is
performed when the α-queue is empty.

For the same reasons as those explained in [5], the
following observation holds.

Observation 1 At any time $t$, the α-queue updated according
to α-Rules 1-4 contains only the jobs that would be active
at time $t$ in the offline schedule. Moreover, the $\mathrm{rem}^{\mathrm{off}}_{i,j}(t)$
fields contain the worst-case remaining execution time of
every active job $\tau_{i,j}$ at time $t$ in the offline schedule.

By consulting the α-queue at any time $t$, MORA is able
to get the required information about any active job $\tau_{i,j}$ in
the offline schedule, i.e., its worst-case remaining execution
time $\mathrm{rem}^{\mathrm{off}}_{i,j}(t)$, its next dispatching time $\mathrm{disp}_{i,j}(t)$ and the
next job dispatching time $\mathrm{nextdisp}(\mathcal{P}_\ell, t)$ on any processor
$\mathcal{P}_\ell$. Due to space limitations, we omit the implementation
details of the procedures which compute $\mathrm{disp}_{i,j}(t)$
and $\mathrm{nextdisp}(\mathcal{P}_\ell, t)$.

Notice that, as explained in [5], the dynamic reduction
of $\mathrm{rem}^{\mathrm{off}}_{i,j}(t)$ from α-Rule 4 does not need to be performed
at every clock cycle. Instead, for efficiency, we perform the
reduction only before MORA modifies a speed, by taking
into account the time elapsed since the last update. Formally,
if $\Delta t$ time units elapsed, the $m$ fields at the head
of the α-queue are updated as follows:
$\mathrm{rem}^{\mathrm{off}}_{i,j}(t + \Delta t) \leftarrow \mathrm{rem}^{\mathrm{off}}_{i,j}(t) - s^{\mathrm{off}}_{i,j} \cdot \Delta t$.
The above approach relies on two
facts. First, as we will see in the next section, the speed
adjustment decisions will be taken only at job arrival time (i.e., the
execution speed of the arriving job is set to its offline speed),
job dispatching time in the offline schedule and whenever a
processor is about to get idle in the actual schedule. Hence,
it is necessary to have an accurate α-queue only at these
instants. Second, between these instants, each task is effectively
executed non-preemptively in the actual schedule.
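A minimal sketch of such a lazily updated α-queue is given below. It is not the authors' implementation: the class and field names are hypothetical, priorities are represented by a number where a smaller value means a higher priority (for instance the absolute deadline under global EDF), and the processor to which each job is assigned in the offline schedule is not tracked. The update loop advances in pieces so that an entry reaching zero in the middle of the elapsed interval is removed and the next entry takes its place, as required by α-Rule 4.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    job: str
    priority: float   # smaller value = higher priority (assumption)
    rem_off: float    # worst-case remaining execution time at s_max
    s_off: float      # offline speed of the job


class AlphaQueue:
    def __init__(self, m):
        self.m = m                 # number of processors
        self.entries = []          # kept sorted by priority (alpha-Rule 1)
        self.last_update = 0.0

    def insert(self, now, job, priority, wcet, s_off=1.0):
        """alpha-Rule 3: insert the WCET of an arriving job."""
        self.update(now)
        self.entries.append(Entry(job, priority, float(wcet), s_off))
        self.entries.sort(key=lambda e: e.priority)

    def update(self, now, eps=1e-12):
        """alpha-Rule 4, applied lazily for the time elapsed since last update."""
        dt = now - self.last_update
        self.last_update = now
        while dt > eps and self.entries:
            head = self.entries[: self.m]
            # Advance until the elapsed time is used up or a head entry finishes.
            step = min(dt, min(e.rem_off / e.s_off for e in head))
            for e in head:
                e.rem_off -= e.s_off * step
            self.entries = [e for e in self.entries if e.rem_off > eps]
            dt -= step


# Running example of Figure 1(a): five jobs released at time 0 on m = 2 processors.
queue = AlphaQueue(m=2)
for name, wcet, deadline in [("tau1", 6, 14), ("tau2", 6, 15), ("tau3", 8, 16),
                             ("tau4", 2, 17), ("tau5", 6, 18)]:
    queue.insert(0.0, name, deadline, wcet)
queue.update(8.0)
print([(e.job, e.rem_off) for e in queue.entries])
```

With this representation, the remaining offline work of any job is read directly from its entry, and the update cost is paid only at the instants where MORA takes a decision.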

3.3. Principle of MORA 

As explained in Section 3.1, whenever a job is dispatched
in the offline schedule, it is also dispatched in the actual
one. However, as we will see below, MORA profits from
an early job completion by starting the execution of some
other jobs earlier in the actual schedule than in the offline
one. As a result, when a job (say $\tau_{k,\ell}$) is dispatched at time
$t$ in the offline schedule (and thus also in the actual one),
its worst-case remaining execution time $\mathrm{rem}_{k,\ell}(t)$ could be
lower than $\mathrm{rem}^{\mathrm{off}}_{k,\ell}(t)$ if it was executed earlier in the actual
schedule. For example, Figure 2 depicts the same set
of tasks as in Figure 1. At time $t = 2$, $\tau_{2,1}$ completes
in the actual schedule on processor $\mathcal{P}_2$ and leaves 4 unused
time units. These 4 time units are reclaimed by starting the
execution of $\tau_{5,1}$ (we will see below how MORA selects
the job which profits from the slack time) and therefore,
when $\tau_{5,1}$ is dispatched to $\mathcal{P}_2$ in the offline schedule at time
$t = 8$, it is also dispatched to $\mathcal{P}_2$ in the actual one and
we have $\mathrm{rem}_{5,1}(8) < \mathrm{rem}^{\mathrm{off}}_{5,1}(8)$. The difference between
these remaining execution times is called the earliness of
the job and we denote it by
$\varepsilon_{k,\ell}(t) \stackrel{\mathrm{def}}{=} \mathrm{rem}^{\mathrm{off}}_{k,\ell}(t) - \mathrm{rem}_{k,\ell}(t)$.
According to this earliness, whenever any job $\tau_{k,\ell}$ is dispatched
in both schedules, its execution speed $s_{k,\ell}$ may
safely be reduced to $s'_{k,\ell}$, so that
$\mathrm{rem}_{k,\ell}(t) / s'_{k,\ell} = \mathrm{rem}^{\mathrm{off}}_{k,\ell}(t) / s^{\mathrm{off}}_{k,\ell}$.
Indeed, under this speed $s'_{k,\ell}$, $\tau_{k,\ell}$ would complete simultaneously
in both schedules if it presents its WCET. This
leads to the first rule of MORA.

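Rule 1 Any job $\tau_{i,j}$ which is dispatched to any processor $\mathcal{P}_\ell$ at
time $t$ in the offline schedule is also dispatched to $\mathcal{P}_\ell$ at time $t$ in
the actual one and its execution speed $s_{i,j}$ is modified according
to

$s_{i,j} \leftarrow S\left( \frac{\mathrm{rem}_{i,j}(t) \cdot s^{\mathrm{off}}_{i,j}}{\mathrm{rem}^{\mathrm{off}}_{i,j}(t)} \right)$    (1)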
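To make Rule 1 concrete, the sketch below evaluates the speed assignment for a few remaining-time values. The function names are hypothetical, the discrete speeds are those of Table 1, and the numeric inputs are made-up values chosen for illustration; they are not read from Figure 2.

```python
XSCALE_SPEEDS = [0.15, 0.4, 0.6, 0.8, 1.0]


def translate_speed(s, speeds=XSCALE_SPEEDS, eps=1e-9):
    """S(s): lowest available speed that is at least s (eps absorbs rounding)."""
    return min(sk for sk in speeds if sk >= s - eps)


def rule1_speed(rem, rem_off, s_off):
    """Speed given to a job dispatched in both schedules (Rule 1).

    rem     : worst-case remaining time in the actual schedule (at s_max)
    rem_off : worst-case remaining time in the offline schedule (at s_max)
    s_off   : offline speed of the job
    """
    if rem_off <= 0:
        raise ValueError("a dispatched job must have remaining offline work")
    return translate_speed(rem * s_off / rem_off)


# Hypothetical job that is 2.4 units ahead of the offline schedule.
print(rule1_speed(rem=3.6, rem_off=6.0, s_off=1.0))
# A job with no earliness keeps its offline speed.
print(rule1_speed(rem=6.0, rem_off=6.0, s_off=1.0))
```

Because the ratio never exceeds the offline speed as long as the job is not late, the translated speed is always one of the available speeds.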

The main idea of MORA can be summarized as follows.
When any job completes in the actual schedule without
consuming its WCET, the unused time may be reclaimed by
starting the execution of any waiting job earlier; and since
this waiting job receives additional time for its execution, it
can thereby reduce its execution speed. Using this concept,
Figure 2 depicts an example of how MORA takes advantage
of an early job completion. When $\tau_{2,1}$ completes at
time $t = 2$ in the actual schedule, MORA selects a waiting
job (here, $\tau_{5,1}$) and executes it during the 4 time units
left by $\tau_{2,1}$. Since $\tau_{5,1}$ is granted 4 additional time
units, MORA reduces its execution speed $s_{5,1}$ so that its
worst-case remaining execution time increases by 4 time
units. The selected job is the one for which the resulting
speed reduction leads to the highest energy saving. Formally,
MORA selects a waiting job and decreases its execution
speed as described by Rule 2.

(a) Offline schedule. (b) Actual schedule.

Figure 2. Rules 1 and 2 of MORA.
Rule 2 Whenever any processor $\mathcal{P}_r$ is about to get idle at time $t$
in the actual schedule,

Step 1. Use the α-queue to compute the next dispatching time
$\mathrm{nextdisp}(\mathcal{P}_r, t)$ on processor $\mathcal{P}_r$ and proceed to
Steps 2-5 for every waiting job $\tau_{i,j}$ at time $t$ in the actual
schedule.

Step 2. Compute the amount $L_{i,j}(t)$ of additional time units that
$\tau_{i,j}$ could reclaim in the actual schedule if it was dispatched
at time $t$, i.e.,

$L_{i,j}(t) \stackrel{\mathrm{def}}{=} \min(\mathrm{nextdisp}(\mathcal{P}_r, t), \mathrm{disp}_{i,j}(t)) - t$

In Figure 2 we have $\mathrm{nextdisp}(\mathcal{P}_2, 2) = 6$ and $L_{i,j}(2) = 4$.

Step 3. Compute what would be the resulting execution speed
$s'_{i,j}$ if $\tau_{i,j}$ was granted both its earliness and these
$L_{i,j}(t)$ additional time units, i.e., $s'_{i,j}$ is computed so that

$\frac{\mathrm{rem}_{i,j}(t)}{s'_{i,j}} = \frac{\mathrm{rem}_{i,j}(t) + \varepsilon_{i,j}(t)}{s^{\mathrm{off}}_{i,j}} + L_{i,j}(t)$, thus leading to

$s'_{i,j} = \frac{\mathrm{rem}_{i,j}(t) \cdot s^{\mathrm{off}}_{i,j}}{\mathrm{rem}^{\mathrm{off}}_{i,j}(t) + L_{i,j}(t) \cdot s^{\mathrm{off}}_{i,j}}$

Step 4. Estimate what would be the resulting execution speed $s''_{i,j}$
if $\tau_{i,j}$ was not granted these $L_{i,j}(t)$ additional time
units. According to Rule 1, $s_{i,j}$ will be modified to $s''_{i,j}$ when
$\tau_{i,j}$ is dispatched in the offline schedule (say at time
$t''$). By assuming that $\tau_{i,j}$ will not be executed in the actual
schedule until time $t''$, we will have $\mathrm{rem}_{i,j}(t'') = \mathrm{rem}_{i,j}(t)$
and from Expression (1)

$s''_{i,j} = \frac{\mathrm{rem}_{i,j}(t) \cdot s^{\mathrm{off}}_{i,j}}{\mathrm{rem}^{\mathrm{off}}_{i,j}(t)}$

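Step 5. Compute the energy saving $\Delta E_{i,j}$ between execution at
speed $s''_{i,j}$ and at speed $s'_{i,j}$:

$\Delta E_{i,j} \stackrel{\mathrm{def}}{=} E_i\left( \frac{\mathrm{rem}_{i,j}(t)}{s''_{i,j}}, s''_{i,j} \right) - E_i\left( \frac{\mathrm{rem}_{i,j}(t)}{s'_{i,j}}, s'_{i,j} \right)$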

Step 6. Dispatch the job $\tau_{k,\ell}$ with the largest $\Delta E_{k,\ell}$ to processor
$\mathcal{P}_r$. If $\Delta E_{i,j} \le 0$ for all the waiting jobs, then dispatch the
waiting job $\tau_{k,\ell}$ (if any) with the highest priority in order to
complete it earlier and to potentially increase the length of
future slack time.

Step 7. If there is a selected job $\tau_{k,\ell}$, set its execution speed $s_{k,\ell}$
to the computed one $s'_{k,\ell}$. Otherwise, turn the processor $\mathcal{P}_r$
into idle mode.

Notice that, if a processor is about to be idle in the actual
schedule exactly when a job is dispatched in the offline one,
only Rule 1 is applied. Algorithm 1 presents the pseudo-code
of MORA and we demonstrate its correctness in the
following section.
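The selection performed by Rule 2 can be sketched as follows. This is an illustrative reading of Steps 2 to 7, not the authors' code: job descriptions are plain dictionaries with hypothetical keys, the candidate speeds are passed through the translation function $S$ before their energy is evaluated (an assumption, since $P(\cdot)$ is only tabulated for the discrete speeds of Table 1), and all $e_i$ are set to 1 in the example.

```python
SPEEDS = [0.15, 0.4, 0.6, 0.8, 1.0]
P_RUN_MW = {0.15: 80, 0.4: 170, 0.6: 400, 0.8: 900, 1.0: 1600}
P_IDLE_MW = 40


def translate_speed(s, eps=1e-9):
    """S(s): lowest available speed that is at least s."""
    return min(sk for sk in SPEEDS if sk >= s - eps)


def energy(e_i, duration, s_k):
    """E_i(R, s_k) with the power figures of Table 1."""
    return duration * (e_i * (P_RUN_MW[s_k] - P_IDLE_MW) + P_IDLE_MW)


def rule2_select(t, nextdisp, waiting):
    """Return (job name, new speed) for the idle processor, or None to idle.

    waiting: list of dicts with keys name, priority (smaller = higher),
             rem, rem_off, s_off, disp (offline dispatch time) and e.
    """
    if not waiting:
        return None                                   # Step 7: idle mode
    scored = []
    for job in waiting:
        extra = min(nextdisp, job["disp"]) - t        # Step 2: L_{i,j}(t)
        s_with = translate_speed(                     # Step 3: s'
            job["rem"] * job["s_off"] / (job["rem_off"] + extra * job["s_off"]))
        s_without = translate_speed(                  # Step 4: s''
            job["rem"] * job["s_off"] / job["rem_off"])
        saving = (energy(job["e"], job["rem"] / s_without, s_without)
                  - energy(job["e"], job["rem"] / s_with, s_with))  # Step 5
        scored.append((saving, job, s_with))
    best = max(scored, key=lambda item: item[0])      # Step 6
    if best[0] <= 0:
        # No job saves energy: fall back to the highest-priority waiting job.
        best = min(scored, key=lambda item: item[1]["priority"])
    return best[1]["name"], best[2]


# Situation of Figure 2 at t = 2, when the second processor becomes idle.
waiting = [
    {"name": "tau3", "priority": 16, "rem": 8, "rem_off": 8, "s_off": 1.0, "disp": 6, "e": 1.0},
    {"name": "tau4", "priority": 17, "rem": 2, "rem_off": 2, "s_off": 1.0, "disp": 6, "e": 1.0},
    {"name": "tau5", "priority": 18, "rem": 6, "rem_off": 6, "s_off": 1.0, "disp": 8, "e": 1.0},
]
print(rule2_select(t=2, nextdisp=6, waiting=waiting))
```

Under these assumptions the job selected at $t = 2$ is the fifth one, which agrees with the choice described for Figure 2; with different $e_i$ values the ranking of the savings, and hence the selected job, may change.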



Algorithm 1: MORA

Determine the offline speed $s^{\mathrm{off}}_{i,j}$ of every job $\tau_{i,j}$;
α-queue ← ∅;

At job arrival (say at time $t$):
    Update the α-queue according to α-Rule 4;
    Insert the value of $C_i$ into the α-queue according to α-Rule 3;
    Set $s_{i,j}$ to $s^{\mathrm{off}}_{i,j}$;

Whenever any processor $\mathcal{P}_r$ is about to get idle at time $t$:
    Update the α-queue according to α-Rule 4;
    Apply Rule 2;

Whenever any job $\tau_{i,j}$ is dispatched to any processor $\mathcal{P}_r$ in the offline
schedule at time $t$:
    Update the α-queue according to α-Rule 4;
    if (a job $\tau_{k,\ell} \ne \tau_{i,j}$ is running on $\mathcal{P}_r$) then preempt $\tau_{k,\ell}$;
    Apply Rule 1;



3.4. Correctness of MORA 

In this section, we formally prove that using MORA does 
not jeopardize the system schedulability. 

Lemma 1 Let $S$ be any preemptive and FJP global
scheduling algorithm and let $\tau$ be any set of real-time tasks.
Suppose that $\tau$ is scheduled by $S$ while using MORA, and
at time $t$ during the system execution we have $\forall \tau_{i,j}$ and
$\forall\, 0 \le t' \le t$:

$\mathrm{rem}_{i,j}(t') \le \mathrm{rem}^{\mathrm{off}}_{i,j}(t')$

Then, $\nexists\, t'$ with $0 \le t' \le t$ such that $\exists \tau_{i,j}$ running at time
$t'$ in the offline schedule and waiting at time $t'$ in the actual
one.

Proof The proof is obtained by contradiction. Suppose that
at some time $t'$ such that $0 \le t' \le t$, $\exists \tau_{i,j}$ running in the
offline schedule and waiting in the actual one. It implies that
at time $t'$ in the offline schedule, there are at most $(m - 1)$
jobs with a higher priority than $\tau_{i,j}$, whereas there are at
least $m$ such jobs in the actual one. In other words, there is
at least one job (say $\tau_{k,\ell}$) at time $t'$ with a higher priority
than $\tau_{i,j}$, such that $\tau_{k,\ell}$ is completed in the offline schedule,
and not in the actual one. For this job, it holds that
$\mathrm{rem}_{k,\ell}(t') > \mathrm{rem}^{\mathrm{off}}_{k,\ell}(t')$, leading to a contradiction with our
hypothesis. The property follows. □

Lemma 2 Let $S$ be any preemptive and FJP global
scheduling algorithm and let $\tau$ be any set of real-time tasks.
Suppose that $\tau$ is scheduled by $S$ while using MORA, and
at time $t$ during the system execution we have $\forall \tau_{i,j}$ and
$\forall\, 0 \le t' \le t$:

$\mathrm{rem}_{i,j}(t') \le \mathrm{rem}^{\mathrm{off}}_{i,j}(t')$

Then, $\nexists\, t'$ with $0 \le t' \le t$ such that $\exists \tau_{i,j}$ running at time $t'$
in the offline schedule and such that the last speed modification
of $\tau_{i,j}$ was performed according to Rule 2.

Proof The proof is obtained by contradiction. Suppose that
at time $t'$ with $0 \le t' \le t$, $\exists \tau_{i,j}$ running at time $t'$ in the
offline schedule and such that the last modification of $s_{i,j}$ was
performed according to Rule 2. Let $t_{\mathrm{actual}}$ and $t_{\mathrm{off}}$ be the
largest instants before time $t'$ at which $\tau_{i,j}$ was dispatched
in the actual and offline schedule, respectively. Notice that
the case where $\tau_{i,j}$ is not dispatched before time $t'$ in the
actual schedule leads to a contradiction of Lemma 1. Therefore,
only two cases may arise: (i) $t_{\mathrm{actual}} \le t_{\mathrm{off}}$; in this
case $s_{i,j}$ would have been modified at time $t_{\mathrm{off}}$ according to
Rule 1, leading to a contradiction of our hypothesis, or (ii)
$t_{\mathrm{actual}} > t_{\mathrm{off}}$, which leads to a contradiction of Lemma 1.
The property follows. □

Theorem 1 Let $S$ be any preemptive and FJP global
scheduling algorithm and let $\tau$ be any set of real-time tasks
which is schedulable by $S$ when every job $\tau_{i,j}$ is executed
at its offline speed $s^{\mathrm{off}}_{i,j}$. Then, every job deadline is still met
when the system is scheduled by $S$ while using MORA.

Proof The proof consists in showing that $\forall \tau_{i,j}$ we have

$\mathrm{rem}_{i,j}(d_{i,j}) \le \mathrm{rem}^{\mathrm{off}}_{i,j}(d_{i,j})$    (2)

while using MORA. Indeed, since the offline schedule
meets all the deadlines, we have $\mathrm{rem}^{\mathrm{off}}_{i,j}(d_{i,j}) = 0$ $\forall \tau_{i,j}$.
Therefore, having $\mathrm{rem}_{i,j}(d_{i,j}) \le \mathrm{rem}^{\mathrm{off}}_{i,j}(d_{i,j})$ leads to
$\mathrm{rem}_{i,j}(d_{i,j}) = 0$ $\forall \tau_{i,j}$, meaning that the actual schedule
also meets all the deadlines.

Initially at time $t = 0$, we obviously have $\mathrm{rem}_{i,j}(0) =
\mathrm{rem}^{\mathrm{off}}_{i,j}(0)$ $\forall \tau_{i,j}$. Now, let $t \ge 0$ be any instant and suppose
that $\forall \tau_{i,j}$ and $\forall\, 0 \le t' \le t$ we have $\mathrm{rem}_{i,j}(t') \le \mathrm{rem}^{\mathrm{off}}_{i,j}(t')$.
We prove in the following that it yields

$\mathrm{rem}_{i,j}(\mathrm{next}(t)) \le \mathrm{rem}^{\mathrm{off}}_{i,j}(\mathrm{next}(t))$ $\forall \tau_{i,j}$    (3)

where $\mathrm{next}(t)$ denotes the earliest instant after time $t$ such
that one of the following events occurs: arrival of a job,
deadline of a job, completion of a job in the actual schedule
or in the offline schedule, dispatching of a job in the actual
schedule or in the offline schedule. Obviously if Inequality (3)
holds then Inequality (2) also holds since $\mathrm{next}(t)$ can denote
every job deadline.

From the definition of $\mathrm{next}(t)$, every processor of both
schedules is either idle or it executes one and only one job
during any time interval $[t, \mathrm{next}(t)]$. In other words, the
state (waiting or running) of any active job in any schedule
does not change during any time interval $[t, \mathrm{next}(t)]$. As a
result, the following relations hold at time $t$:

• For any waiting job $\tau_{i,j}$ in the actual schedule:

$$\mathrm{rem}_{i,j}(\mathrm{next}(t)) = \mathrm{rem}_{i,j}(t) \qquad (4)$$

• For any waiting job $\tau_{i,j}$ in the offline schedule:

$$\mathrm{rem}^{\mathrm{off}}_{i,j}(\mathrm{next}(t)) = \mathrm{rem}^{\mathrm{off}}_{i,j}(t) \qquad (5)$$

• For any running job $\tau_{i,j}$ in the actual schedule:

$$\mathrm{rem}_{i,j}(\mathrm{next}(t)) \le \mathrm{rem}_{i,j}(t) \qquad (6)$$

The first part of the proof shows that Inequality (3) holds
for every waiting job at time $t$ in the actual schedule and the
second part shows that it also holds for every running job
at time $t$ in the actual schedule.

Part 1. Let $\tau_{k,\ell}$ be any waiting job at time $t$ in the actual
schedule. From Lemma 1 we know that $\tau_{k,\ell}$ is also
waiting at time $t$ in the offline one and since by hypothesis
$\mathrm{rem}_{k,\ell}(t) \le \mathrm{rem}^{\mathrm{off}}_{k,\ell}(t)$, we know from Equalities (4) and (5)
that $\mathrm{rem}_{k,\ell}(\mathrm{next}(t)) \le \mathrm{rem}^{\mathrm{off}}_{k,\ell}(\mathrm{next}(t))$. The property
follows.

Part 2. Let $\tau_{k,\ell}$ be any running job at time $t$ in the actual
schedule. Regarding its execution speed $s_{k,\ell}$, only two cases
may occur: its last modification was performed by Rule 1
(case 1) or by Rule 2 (case 2).

Case 1. $\tau_{k,\ell}$ is running at time $t$ in the actual schedule
and the last modification of $s_{k,\ell}$ was performed according
to Rule 1 when it was dispatched in the offline
schedule (say at time $t_{\mathrm{off}} \le t$). By hypothesis, we have
$\mathrm{rem}_{k,\ell}(t_{\mathrm{off}}) \le \mathrm{rem}^{\mathrm{off}}_{k,\ell}(t_{\mathrm{off}})$ and we know that $\tau_{k,\ell}$ is
executed non-preemptively in both schedules during the time
interval $[t_{\mathrm{off}}, t]$. Indeed, if it was preempted in the offline
schedule, it would have been also preempted in the actual
one according to Rule 1. However, in the actual schedule,
$\tau_{k,\ell}$ is running at times $t_{\mathrm{off}}$ and $t$. Therefore, its speed would
have been modified according to Rule 2 at its re-dispatching
time if it was preempted during $[t_{\mathrm{off}}, t]$. As a result, from our
interpretation of the processor speed we get

$$\mathrm{rem}_{k,\ell}(\mathrm{next}(t)) = \mathrm{rem}_{k,\ell}(t_{\mathrm{off}}) - s_{k,\ell} \cdot (\mathrm{next}(t) - t_{\mathrm{off}}) \qquad (7)$$

and

$$\mathrm{rem}^{\mathrm{off}}_{k,\ell}(\mathrm{next}(t)) = \mathrm{rem}^{\mathrm{off}}_{k,\ell}(t_{\mathrm{off}}) - s^{\mathrm{off}}_{k,\ell} \cdot (\mathrm{next}(t) - t_{\mathrm{off}}) \qquad (8)$$

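After the speed modification by Rule 1 at time $t_{\mathrm{off}}$, we know
from Expression (1) that $s_{k,\ell} = \frac{\mathrm{rem}_{k,\ell}(t_{\mathrm{off}})}{\mathrm{rem}^{\mathrm{off}}_{k,\ell}(t_{\mathrm{off}})} \cdot s^{\mathrm{off}}_{k,\ell}$ and
Equality (7) can be rewritten as

$$\mathrm{rem}_{k,\ell}(\mathrm{next}(t)) = \mathrm{rem}_{k,\ell}(t_{\mathrm{off}}) - \frac{\mathrm{rem}_{k,\ell}(t_{\mathrm{off}})}{\mathrm{rem}^{\mathrm{off}}_{k,\ell}(t_{\mathrm{off}})} \cdot s^{\mathrm{off}}_{k,\ell} \cdot (\mathrm{next}(t) - t_{\mathrm{off}})$$

Finally, notice that multiplying the right-hand side of
the above Equality by $\frac{\mathrm{rem}^{\mathrm{off}}_{k,\ell}(t_{\mathrm{off}})}{\mathrm{rem}_{k,\ell}(t_{\mathrm{off}})}$ leads to the right-hand
side of Equality (8). Since by hypothesis $\mathrm{rem}_{k,\ell}(t_{\mathrm{off}}) \le
\mathrm{rem}^{\mathrm{off}}_{k,\ell}(t_{\mathrm{off}})$, we have $\frac{\mathrm{rem}^{\mathrm{off}}_{k,\ell}(t_{\mathrm{off}})}{\mathrm{rem}_{k,\ell}(t_{\mathrm{off}})} \ge 1$ and therefore
$\mathrm{rem}_{k,\ell}(\mathrm{next}(t)) \le \mathrm{rem}^{\mathrm{off}}_{k,\ell}(\mathrm{next}(t))$. The property follows.

Case 2. $\tau_{k,\ell}$ is running at time $t$ in the actual schedule
and the last modification of $s_{k,\ell}$ was performed according
to Rule 2. Therefore, we know from Lemma^ that $\tau_{k,\ell}$ is
waiting at time $t$ in the offline schedule, and since by
hypothesis $\mathrm{rem}_{k,\ell}(t) \le \mathrm{rem}^{\mathrm{off}}_{k,\ell}(t)$, we know from Equality (5)
and Inequality (6) that $\mathrm{rem}_{k,\ell}(\mathrm{next}(t)) \le \mathrm{rem}^{\mathrm{off}}_{k,\ell}(\mathrm{next}(t))$. The
theorem follows. □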
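
The scaling argument of Case 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names and the numeric values are hypothetical, and the remaining work is assumed to decrease linearly with the speed, as in Equalities (7) and (8).

```python
def rule1_speed(rem_actual: float, rem_offline: float, s_offline: float) -> float:
    """Expression (1): scale the offline speed by the ratio of remaining work."""
    return rem_actual / rem_offline * s_offline


def remaining(rem_at_dispatch: float, speed: float, elapsed: float) -> float:
    """Equalities (7)/(8): work left after running `elapsed` time units at `speed`."""
    return rem_at_dispatch - speed * elapsed


# Hypothetical values at the offline dispatching time t_off,
# chosen so that rem_act <= rem_off as in the induction hypothesis.
rem_act, rem_off, s_off = 3.0, 4.0, 0.5
s_act = rule1_speed(rem_act, rem_off, s_off)

# `elapsed` plays the role of next(t) - t_off while the job keeps running.
for elapsed in (0.0, 2.0, 4.0, 8.0):
    actual_left = remaining(rem_act, s_act, elapsed)
    offline_left = remaining(rem_off, s_off, elapsed)
    assert actual_left <= offline_left  # Inequality (3) for this job
```

With these values the job runs slower in the actual schedule than in the offline one, yet both remaining-work curves reach zero at the same instant, so the deadline met offline is also met in the actual schedule.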

4. Simulation results 

In this section, we compare the effectiveness of MORA
with other energy-aware algorithms. However, it is only
meaningful to compare MORA with approaches that consider
the same models of computation, and the most related
paper to ours is [26], where two methods with the same
task and platform model are proposed. However, these two
methods do not take into account the application-specific
parameter $e_i$ of task $\tau_i$. The first method proposed in [26]
(that we denote by OFF hereafter) is an offline speed
determination technique for Global-EDF which determines a
unique and constant speed $s^{\mathrm{off}}$ for all the processors such
that all the job deadlines are met under this speed. In our
simulations, this OFF method is used by MORA in order
to provide the offline speed $s^{\mathrm{off}}_{i,j}$ of every job $\tau_{i,j}$, i.e., $s^{\mathrm{off}}$ is
determined at line 1 of Algorithm 1 and $s^{\mathrm{off}}_{i,j}$ is set to $s^{\mathrm{off}}$
between lines 4 and 5. The second method proposed in [26] is
the MOTE algorithm. At run-time, it anticipates the coming
idle instants in the schedule and adjusts the speed of the
processors accordingly, i.e., it reduces the processors' speed in
order to minimize the proportion of time during which the
system is idle. Since this algorithm is also based on the
concept of the offline speeds, we consider that OFF is also used
to provide them. Although MORA could also be compared with
frame-based scheduling algorithms (since the sporadic task
model is a generalization of the frame-based task model),
we do not perform such comparisons in this paper.

In our simulations, we schedule periodic implicit-deadline
systems (i.e., $\forall \tau_i$, $T_i$ is here the exact inter-arrival
delay between successive jobs and $D_i = T_i$). The energy
consumption of each generated system is computed by
simulating three methods: MOTE, MORA and MORAOTE,
i.e., a combination of MOTE and MORA. Indeed,
since these algorithms do not interfere with each other, the
MOTE rule can be applied on the offline speeds just
before applying Rule^ of MORA (i.e., between lines 9 and
10 of Algorithm 1). Although the implementation details of
MORAOTE are omitted here due to the space limitation, we
will see in our simulation results that this combination
always improves the provided energy savings. The
consumptions provided by these three methods are compared with
the consumption of the MAX method, where all the jobs are
executed at the maximal processor speed $s_{\max} = 1$. That
is, we consider that the consumption by MAX is 100% and
the consumptions of the other methods are normalized.

In every simulation, we generated 100 sets of tasks with
a total density $\delta_{\mathrm{sum}}(\tau)$ within $[d, d + 0.05)$ where $d =
0, 0.05, \ldots, 9.95$, leading to a total of 20000 generated
task sets for each simulation. The upper bound on $\delta_{\mathrm{sum}}(\tau)$
(i.e., 10) was chosen in order to cover a large number of
systems while keeping the simulation time reasonable. For
a given total density, task densities $\delta_i$ are uniformly
generated within $[0.01, D_{\max}]$ until the total density $\delta_{\mathrm{sum}}(\tau)$
reaches the expected one (the upper bound $D_{\max}$ on the
task densities will be discussed later). Notice that the
number $n$ of tasks is not fixed beforehand, i.e., it depends on this
step that generates task densities. Next, the other task
parameters $C_i$, $D_i$ and $T_i$ are randomly generated according to
their respective density $\delta_i$. Finally, the application-specific
parameters $e_i$ are uniformly chosen in $[0.8, 1.2]$ so that the
consumption of the tasks varies between 80% and 120% of
the power of the measured benchmark.
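
The generation step described above can be sketched as follows. This is only an illustration under assumptions that the text does not state: the set of candidate periods is hypothetical, deadlines equal periods (implicit-deadline systems), and the last density drawn is capped so that the total stays below d + 0.05. The names `Task` and `generate_task_set` are ours.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Task:
    wcet: float      # C_i
    deadline: float  # D_i
    period: float    # T_i
    e: float         # application-specific parameter e_i

    @property
    def density(self) -> float:
        return self.wcet / min(self.deadline, self.period)


def generate_task_set(d: float, d_max: float, rng: random.Random,
                      periods: Sequence[int] = (10, 20, 25, 50, 100)) -> List[Task]:
    """Draw densities in [0.01, d_max] until their sum lies in [d, d + 0.05)."""
    tasks: List[Task] = []
    total = 0.0
    while total < d or not tasks:
        delta = rng.uniform(0.01, d_max)
        # Assumption: cap the last density so the total stays below d + 0.05.
        delta = min(delta, d + 0.049 - total)
        period = rng.choice(periods)  # hypothetical period values
        tasks.append(Task(wcet=delta * period, deadline=period,
                          period=period, e=rng.uniform(0.8, 1.2)))
        total += delta
    return tasks


rng = random.Random(42)  # fixed seed so that a run can be reproduced
task_set = generate_task_set(d=2.0, d_max=0.5, rng=rng)
print(len(task_set), sum(t.density for t in task_set))
```

As in the text, the number of tasks is not chosen beforehand: it is whatever the density-drawing loop produces for the requested total density.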

Once a set of tasks is generated, it is executed during
100 hyper-periods (i.e., the least common multiple of the
task periods) by the four methods MAX, MOTE, MORA
and MORAOTE. This upper bound on the system execution
time was chosen to ensure that every task generates at least
100 jobs (for the same reasons as those mentioned above).
During each system execution, the actual execution time
of every job $\tau_{i,j}$ is uniformly generated in $[C_i/10, C_i]$. This
lower bound $C_i/10$ was chosen in order to reflect the fact that
a job may take up to 10 times less than its WCET. Finally,
for every generated task set $\tau$, the number $m$ of processors
must be sufficient to schedule $\tau$ by MAX without missing
any deadline. Hence, we set $m$ to the lowest integer that
passes one of the following EDF-schedulability tests: the
density-based test [19], the load-based test [12] and the test
denoted Test 13 in [8]. Simulations were performed while
considering different scheduling algorithms (Global-EDF
and Global-DM) and various processor models. However,
due to the space limitation, we only depict in this paper the
results provided by Global-EDF on Intel XScale processors
(outlined in Table 1, page 2).
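
The simulation horizon and the actual execution times described above can be sketched as follows. The function names are hypothetical and integer periods are assumed so that the least common multiple is well defined.

```python
import math
import random
from functools import reduce
from typing import Sequence


def hyper_period(periods: Sequence[int]) -> int:
    """Least common multiple of the (integer) task periods."""
    return reduce(lambda a, b: a * b // math.gcd(a, b), periods)


def actual_execution_time(wcet: float, rng: random.Random) -> float:
    """Draw an actual execution time uniformly in [C_i / 10, C_i]."""
    return rng.uniform(wcet / 10, wcet)


periods = [10, 20, 25]                 # hypothetical task periods
horizon = 100 * hyper_period(periods)  # each task set runs for 100 hyper-periods
rng = random.Random(0)
sample = actual_execution_time(5.0, rng)
```

Since every period divides the hyper-period, each task releases at least one job per hyper-period, hence at least 100 jobs over the whole horizon.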

Observation 2 The effectiveness of both MORA and
MOTE mainly relies on the ratio $m/n$, but antagonistically.

This observation stems from the fact that MORA saves
energy via the waiting jobs whereas MOTE profits from the
absence of waiting jobs. When $m/n$ tends to 1, jobs tend to
never wait for a free processor and MOTE therefore
provides significant energy savings whereas the effectiveness
of MORA is almost null. On the other hand, when $m/n$ tends
to 0, processors tend to consecutively execute several
distinct jobs and jobs are often waiting. As a result, MORA
is often able to reclaim unused time and provides
important energy savings whereas the effectiveness of MOTE is
negligible.

According to our task generation process, we are not able
to directly set the ratio $m/n$ to any given value. However, the
number $m$ of processors is obtained by using a combination
of sufficient schedulability tests and the accuracy of these
tests mainly relies on $\delta_{\max}(\tau)$. Basically, the ratio $m/n$
increases as $\delta_{\max}(\tau)$ becomes larger and since the generated
task sets are more likely to have a large $\delta_{\max}(\tau)$ when the
upper bound $D_{\max}$ is high, we can indirectly control the
ratio $m/n$ via $D_{\max}$. The Y-axis of Figure 3 represents the ratio
$m/n$ obtained from the used schedulability tests when $\delta_{\mathrm{sum}}(\tau)$
varies within $[0, 10]$ and $D_{\max}$ varies within $[0.1, 1]$ with a
step of 0.1.
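
To illustrate why the ratio m/n grows with the largest density, the sketch below computes the smallest m accepted by one sufficient test only. It assumes the commonly cited form of the density-based test for Global-EDF, namely that the total density must not exceed m - (m - 1) times the largest density. Whether this matches the exact formulation used in the simulations is an assumption, and the simulations take the lowest m passing any of three tests, so the values below can only over-estimate m. The function name and the cap `m_cap` are hypothetical.

```python
from typing import Optional, Sequence


def min_processors_density_test(densities: Sequence[float],
                                m_cap: int = 1000) -> Optional[int]:
    """Smallest m such that sum(densities) <= m - (m - 1) * max(densities)."""
    d_sum, d_max = sum(densities), max(densities)
    for m in range(1, m_cap + 1):
        # Small tolerance to absorb floating-point rounding in the sum.
        if d_sum <= m - (m - 1) * d_max + 1e-12:
            return m
    return None  # no m up to m_cap passes this test


# Two hypothetical task sets with the same total density (2.0).
light = [0.1] * 20  # many light tasks
heavy = [0.5] * 4   # few heavy tasks
m_light = min_processors_density_test(light)
m_heavy = min_processors_density_test(heavy)
ratio_light = m_light / len(light)
ratio_heavy = m_heavy / len(heavy)
```

Both sets need 3 processors under this test, but the ratio m/n is 0.15 for the light set and 0.75 for the heavy one, which is the trend the paragraph above describes.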

For every $D_{\max}$ multiple of 0.1 within $[0.1, 1]$, 20000
sets of tasks were generated by the generation process
described above and the resulting average consumptions of
MOTE, MORA and MORAOTE are depicted in
Figure 4. The Y-axis is the average energy consumption of
every method compared with the MAX method (in %) and the
X-axis is the corresponding value of $D_{\max}$ during the
simulation. As we can see, Figures 3 and 4 clearly corroborate
Observation 2. Moreover, Figure 4 shows that MORA can
save up to 32% of energy (on average) over the MAX method
(for $D_{\max} = 0.1$) and the algorithm MORAOTE provides
important energy savings for various values of $D_{\max}$.
Notice that a part of the energy savings is explained by the
use of the OFF method, which leads MOTE and MORA to
energy savings of about 10% when $m/n$ tends to 0 and 1,
respectively. Furthermore, although other processor
models and scheduling algorithms led to different average
consumptions, the evolution of the consumption with respect to
$D_{\max}$ remains similar to that in Figure 4.




Figure 3. Ratio $m/n$ for different values of $D_{\max}$ (X-axis: total density).

Figure 4. Average consumptions of MOTE,
MORA and MORAOTE for various values of
$D_{\max}$ under Global-EDF on Intel XScale processors.



5. Conclusion 

In this paper, we propose a slack reclamation scheme
called MORA which reduces the energy consumption while
scheduling a set of sporadic constrained-deadline tasks by a
global, preemptive and FJP algorithm on a fixed number of
DVFS-identical processors. According to [15] and to the
best of our knowledge, we are the first to address such an
approach in this context. The proposed algorithm MORA
exploits early job completions at run-time by starting the
execution of the next waiting jobs at a lower speed.
Compared with other reclaiming algorithms such as the DRA
proposed in [5], MORA takes into account the
application-specific consumption profile of the tasks in order to improve
the energy saving that it provides. Moreover, we proved that
using MORA does not jeopardize the system schedulability
and we showed in our simulations that it can save up to 32%
of energy (on average) compared to execution without using
any energy-aware algorithm.

In our future work, we aim to specialize MORA so that
it will take into account more practical constraints such as
preemption costs, migration costs and time overheads due
to the multiple frequency switching. Moreover, we aim to
extend our processor model in order to handle the various
idle and sleep modes of the processors and to take into
account the energy costs due to frequency switching. In other
future work, we also aim to propose a new multiprocessor
reclamation scheme which anticipates the early completion
of jobs for further reducing the CPU speed. This approach
will be based on statistical information about tasks that is
assumed to be known a priori. Some uniprocessor
energy-aware algorithms already exploit this concept (see the AGR
algorithm proposed in [5] for instance).

References 

[1] Intel XScale Microarchitecture: Benchmarks, 2005. 
http://web.archive.org/web/20050326232506/developer.in- 
tel.com/design/intelxscale/benchmarks.htm. 

[2] T. A. AlEnawy and H. Aydin. Energy-aware task allocation
for rate monotonic scheduling. In Proceedings of the
11th IEEE Real-time and Embedded Technology and Applications
Symposium, pages 213-223, 2005.

[3] J. Anderson and S. Baruah. Energy-efficient synthesis of 
EDF-scheduled multiprocessor real-time systems. Interna- 
tional Journal of Embedded Systems, 4(1), 2008. 

[4] H. Aydin, R. Melhem, D. Mosse, and P. Mejia-Alvarez. De- 
termining optimal processor speeds for periodic real-time 
tasks with different power characteristics. In Proceedings 
of the IEEE EuroMicro Conference on Real-Time Systems, 
pages 225-232, 2001. 

[5] H. Aydin, R. Melhem, D. Mosse, and P. Mejía-Alvarez.
Power-aware scheduling for periodic real-time tasks. IEEE 
Transactions on Computers, 53(5):584-600, 2004. 

[6] H. Aydin and Q. Yang. Energy-aware partitioning for mul- 
tiprocessor real-time systems. In Proceedings of 17th In- 
ternational Parallel and Distributed Processing Symposium, 
pages 113-121, 2003.

[7] T. Baker. Multiprocessor EDF and deadline monotonic
schedulability analysis. In Proceedings of the 24th IEEE
International Real-Time Systems Symposium, pages 120-129,
December 2003.

[8] T. Baker and S. Baruah. Schedulability Analysis of Mul- 
tiprocessor Sporadic Task Systems. In Handbook of Real- 
Time and Embedded Systems, Sang H. Son, Insup Lee, and 
Joseph Y-T Leung (eds). Chapman Hall/ CRC Press, De- 
cember 2006. 

[9] N. Bansal, T. Kimbrel, and K. Pruhs. Dynamic speed scal- 
ing to manage energy and temperature. In Proceedings of 
the Symposium on Foundations of Computer Science, pages 
520-529, 2004. 

[10] S. Baruah and J. Anderson. Energy-aware implementation
of hard-real-time systems upon multiprocessor platform. In
Proceedings of the 16th International Conference on Parallel
and Distributed Computing Systems, pages 430-435,
August 2003.

[11] S. Baruah and J. Anderson. Energy-efficient synthesis of
periodic task systems upon identical multiprocessor platforms.
In Proceedings of the 24th International Conference on
Distributed Computing Systems, pages 428-435, Tokyo, Japan,
March 2004. IEEE Computer Society Press.

[12] S. Baruah and T. Baker. Schedulability analysis of global 
EDF. Real Time Systems, Accepted for publication, 2008. 

[13] V. Berten and J. Goossens. Multiprocessor global schedul- 
ing on frame-based DVFS systems. In I. Puaut, editor, The 
29th IEEE Real-Time Systems Symposium, WiP proceedings, 
pages 21-24, 2008. 

[14] J.-J. Chen, H.-R. Hsu, K.-H. Chuang, C.-L. Yang, A.- 
C. Pang, and T.-W. Kuo. Multiprocessor energy-efficient 
scheduling with task migration considerations. In Proceed- 
ings of the 16th Euromicro Conference on Real-Time Sys- 
tems (ECRTS'04), pages 101-108, 2004. 

[15] J. J. Chen and C. Kuo. Energy-efficient scheduling for 
real-time systems on dynamic voltage scaling (DVS) plat- 
forms. In the 13th IEEE International Conference on Em- 
bedded and Real-Time Computing Systems and Applications 
(RTCSA), pages 28-38, August 21-24 2007. 

[16] J.-J. Chen and T.-W. Kuo. Multiprocessor energy-efficient 
scheduling for real-time tasks. In International Conference 
on Parallel Processing, pages 13-20, 2005. 



[17] J.-J. Chen, C.-Y. Yang, and T.-W. Kuo. Slack reclamation 
for real-time task scheduling over dynamic voltage scaling 
multiprocessors. In IEEE International Conference on Sen- 
sor Networks, Ubiquitous, and Trustworthy Computing, vol- 
ume 1, pages 358-367, June 2006. 

[18] J.-J. Chen, C.-Y. Yang, and T.-W. Kuo. Slack reclamation for 
real-time task scheduling over dynamic voltage scaling mul- 
tiprocessors. In IEEE International Conference on Sensor 
Networks, Ubiquitous, and Trustworthy Computing (SUTC), 
Taichung, Taiwan, 2006. 

[19] J. Goossens, S. Funk, and S. Baruah. Priority-driven 
scheduling of periodic task systems on uniform multipro- 
cessors. Real Time Systems, 25:187-205, 2003. 

[20] http://www.pdadb.net. 

[21] Intel. Intel PXA27x Processor Family, Design guide, May 
2005. 

[22] S. Irani, S. Shukla, and R. Gupta. Algorithms for power
savings. In Proceedings of the 14th Annual ACM-SIAM
Symposium on Discrete Algorithms, pages 37-46, 2003.

[23] T. Ishihara and H. Yasuura. Voltage scheduling problem for 
dynamically variable voltage processors. In International 
Symposium on Low Power Electronics and Design, pages 
197-202, 1998. 

[24] G. Magklis, G. Semeraro, D. H. Albonesi, S. G. Dropsho, 
S. Dwarkadas, and M. L. Scott. Dynamic frequency and 
voltage scaling for a multiple-clock-domain microprocessor. 
In IEEE Micro, volume 23, pages 62-68, 2003. 

[25] P. Mejía-Alvarez, E. Levner, and D. Mosse. Adaptive
scheduling server for power-aware real-time tasks. ACM 
Transactions on Embedded Computing Systems, 2(3):284- 
306, 2004. 

[26] V. Nelis, J. Goossens, N. Navet, R. Devillers, and D. Miloje- 
vic. Power-aware real-time scheduling upon identical mul- 
tiprocessor platforms. In IEEE International Conference on 
Sensor Networks Ubiquitous and Trustworthy Computing, 
pages 209-216, June 2008. 

[27] P. Pillai and K. Shin. Real-time dynamic voltage scaling 
for low powered embedded systems. Operating Systems Re- 
view, 35:89-102, October 2001. 

[28] Y. Shin and K. Choi. Power conscious fixed priority schedul- 
ing for hard real-time systems. In Design Automation Con- 
ference, pages 134-139, 1999. 

[29] E. Talpes and D. Marculescu. Toward a multiple
clock/voltage island design style for power-aware processors.
In IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
volume 13, pages 591-603, 2005.

[30] R. Xu, R. Melhem, and D. Mosse. A unified practical ap- 
proach to stochastic DVS scheduling. In EMSOFT, pages 
37-46, 2007.

[31] C.-Y. Yang, J.-J. Chen, and T.-W. Kuo. An approximation 
algorithm for energy-efficient scheduling on a chip multi- 
processor. In Proceedings of the 8th Conference of Design, 
Automation, and Test, pages 468-473, 2005. 

[32] F. Yao, A. Demers, and S. Shenker. A scheduling model 
for reduced CPU energy. In Proceedings of the 36th IEEE 
Annual Foundations of Computer Science, pages 374-382, 
1995. 

[33] F. Zhang and S. Chanson. Processor voltage scheduling for 
real-time tasks with non-preemptible sections. In 23th Real- 
Time Systems Symposium, pages 235-245, 2002. 




[34] D. Zhu, R. Melhem, and B. Childers. Scheduling with dy- 
namic voltage/speed adjustment using slack reclamation in 
multi-processor real-time systems. In Proceedings of IEEE 
22th Real-Time System Symposium, pages 84-94, 2001. 

[35] J. Zhuo and C. Chakrabarti. System-level energy-efficient 
dynamic task scheduling. In ACM/IEEE Design Automation 
Conference, pages 628-631, June 2005. 


